System and Method for Register Renaming



Cross-Reference to Related Applications

The following are related, commonly owned, co-pending applications: 
Superscalar RISC Instruction Scheduling, Serial No. 07/860,719, filed March 
31, 1992; 

Semiconductor Floor Plan and Method for a Register Renaming Circuit, Serial
No. 07/860,718, filed March 31, 1992;

System and Method for Retiring in a Superscalar Microprocessor, Serial No. 
07/877,451, filed 5/15/92; 

High Performance RISC Microprocessor Architecture, Serial No. 07/817,810, 
filed 1/8/92; 

Extensible RISC Microprocessor Architecture, Serial No. 07/817,809, filed 
1/8/92. 

The above cited patent documents are incorporated herein by reference. 

Background of the Invention 

1. Field of the Invention 

The present invention relates to superscalar reduced instruction set
computers (RISC). More particularly, the present invention relates to a register
renaming circuit for superscalar RISC computers.

2. Related Art 

A more detailed description of some of the basic concepts discussed in this
application is found in a number of references, including Mike Johnson, Superscalar
Microprocessor Design (Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1991)
(hereafter Johnson), and John L. Hennessy et al., Computer Architecture - A Quantitative
Approach (Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990).
Johnson's text, particularly Chapters 2, 6 and 7, provides an excellent discussion of the
register renaming issues addressed by the present invention.

A major consideration in a superscalar RISC processor is how to execute
multiple instructions in parallel and out of order without incurring data errors due to
dependencies inherent in such execution. Data dependency checking, register
renaming and instruction scheduling are integral aspects of the solution. A detailed
discussion of storage conflicts, register renaming and dependency mechanisms is
found in commonly owned, co-pending U.S. patent application Ser. No. 07/860,719,
filed March 31, 1992 (hereafter referred to as the '719 application).

The '719 application discloses a register renaming circuit (RRC) having a fixed
instruction window. In the fixed instruction window, dependency checking and
resource assignment are done on the same set of instructions until all the instructions
in the set are completed. This means that there is a fixed relationship between each
instruction's position in the instruction stream and the instruction number used for
resource assignment.

For example, in an instruction stream with an instruction window of 4
instructions, every fourth instruction maps to instruction A (i.e., the first instruction
in the window). This technique makes re-mapping of instructions very simple. In this
case a 4-to-1 multiplexer is all that is necessary for each resource to forward a single
instruction to that resource. However, the fixed format requires that the instruction
window be advanced by a fixed amount, which results in somewhat inefficient
processing.

When an instruction retires (an instruction can retire after it has been
executed without exception and when all previous instructions have been executed
and their results are stored in the register file), its result is moved into a main register
file (i.e., the programmer visible register file) and if any instructions were dependent
on that instruction, their renamed sources are not needed anymore. In the
architecture disclosed in the '719 application, all instructions' sources are renamed
every cycle. This renaming technique requires many comparators for performing the
dependency checks. More specifically, the source register addresses of each
instruction must be compared to the destination register addresses of all preceding
instructions in the instruction window every cycle.

What is desired is a more efficient register renaming technique requiring fewer
comparators and permitting the processor to execute instructions in parallel and out
of order.

Summary of the Invention 

The present invention is directed to a system and method for performing
register renaming of source registers on a per-cycle basis only for new instructions
added to the instruction window in that cycle. The present invention thus reduces the
total number of dependency check comparators necessary for performing register
renaming.

A preferred embodiment of the present invention comprises storing the
instructions in a variable advance instruction window, and assigning a tag to each
instruction in the instruction window. The tag of each retired instruction is assigned
to the next new instruction to be added to the instruction window. The results of
instructions executed by the processor are stored in a temp buffer according to their
corresponding tags to avoid output dependencies and anti-dependencies. The temp
buffer therefore permits the processor to execute instructions out of order and in
parallel.

Data dependency checks are performed only for each new instruction added to
the instruction window. Operands of the instructions having input dependencies are
often located in the temporary buffer, and the source register addresses of those
instructions having dependencies are renamed according to the tags of the operands
located in the temp buffer. The renamed source register addresses are then stored in
a rename result register file.

The foregoing and other features and advantages of the present invention will
be apparent from the following more particular description of the preferred
embodiments of the invention, as illustrated in the accompanying drawings.

Brief Description of the Drawings 

The invention will be better understood if reference is made to the
accompanying drawings. A brief description of the drawings is as follows:

FIG. 1 shows a representative block diagram of a DDC equal compare circuit 
of the present invention. 

FIG. 2 shows a representative block diagram of an N-1 input priority encoder
of the present invention.

FIG. 3 shows a representative block diagram of the tag assignment logic
(TAL) of the present invention.

FIG. 4 shows a representative block diagram of the TAL and priority encoder
circuit of the present invention.

FIGS. 5A and 5B show representative block diagrams of the register rename
block of the present invention.

FIG. 6 shows a representative block diagram of the register rename register 
file of the present invention. 

FIG. 7 shows a representative block diagram of the interconnection of the
blocks of FIGS. 5B and 6.

FIG. 8 shows a representative high level block diagram including the RRC of
the present invention.

FIG. 9 shows a representative block diagram of a circuit to generate the 
address for one register file port. 

Detailed Description of the Invention 

The terms processor, CPU, and digital processor are often used
interchangeably in this field. The term "processor" is used hereafter with the
understanding that other similar terms could be substituted therefor without
changing the underlying meaning of this disclosure.

The present invention is directed to a Register Renaming Circuit (RRC) which
is part of a processor. The RRC permits the processor to execute instructions in
parallel and out of order. In a preferred embodiment of the present invention, the
processor has a variable advance instruction window (VAIW) for holding instructions
from an instruction stream prior to execution. The RRC can be used with a fixed
advance instruction window as well.

The VAIW in a preferred embodiment holds eight instructions, and up to four 
new instructions can be added to the top four locations of the VAIW in any one cycle. 

In a VAIW, any one of instructions I0, I1, I2 and I3 can be mapped into the
first location in the window (location A, for example). Tags are assigned to the
instructions as the instructions enter the VAIW. The tags are stored in a
first-in-first-out buffer (hereafter called a FIFO; not shown).

As an instruction advances in the VAIW by a variable amount, the tag
associated with that instruction also advances in the FIFO by the same amount.
When a new instruction enters the VAIW, it is assigned the tag of the most recent
instruction to leave the VAIW; thus tags are reused. Instructions can leave the
VAIW either by retiring or by being flushed out if a branch is taken.

The tag of each instruction leaving the instruction window is returned to the
head of the FIFO and re-used by the new instruction added to the window. However,
the first instruction and tag in the FIFO always progress in order, because
instructions always retire in order.
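
By way of illustration only, the tag recycling described above can be modeled in software as follows. This is a minimal sketch, assuming an eight-entry window; the class name `TagFifo` and its methods are hypothetical and do not appear in the disclosure, the tag FIFO itself is not shown in the figures, and branch flushes are not modeled.

```python
from collections import deque


class TagFifo:
    """Toy model of tag reuse in a variable advance instruction window."""

    def __init__(self, size=8):
        self.free = deque(range(size))  # tags not currently held by an instruction
        self.window = deque()           # tags of in-flight instructions, oldest first

    def enter(self):
        """A new instruction enters the window and takes a recycled tag."""
        tag = self.free.popleft()
        self.window.append(tag)
        return tag

    def retire(self):
        """The oldest instruction leaves (in order); its tag becomes reusable."""
        tag = self.window.popleft()
        self.free.append(tag)
        return tag
```

In this model, filling the window assigns tags 0 through 7; retiring the two oldest instructions frees tags 0 and 1, and the next two instructions to enter receive those same tags while the tags of the remaining instructions are unchanged.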

According to the present invention, only new instructions in the VAIW need be
checked for dependencies. This eliminates the need for excess comparators. New
instructions in the window are therefore passed through the RRC. In order to reduce
complexity by renaming the source registers for only those instructions that are new
in the instruction window on a per-cycle basis, two assumptions are made:

1. Each instruction's tag remains constant as long as the instruction
remains in the window. This tag is also associated with the location in a
temp buffer (discussed below) in which the corresponding instruction's
output will be stored.

2. At most, only a subset (I_n to I_{n-1}) of the instructions in the
window (I_n to I_0) can be new in any given cycle.

In a preferred embodiment of the present invention, the temp buffer (or
temporary buffer) is part of the main register file. The register file contains 40
registers; registers 0-31 are the main registers (commonly called the programmer
visible registers), and registers 32-39 comprise the temp buffer. The temp buffer and
main register file share the same read ports. Thus, to read the data from temp buffer
address 4, {100100} would be the address on the read address port, for example.
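
The address arithmetic in this example can be checked with a short sketch. The function and constant names below are hypothetical; the only facts taken from the description are that registers 0-31 are the main registers and registers 32-39 are the temp buffer.

```python
NUM_MAIN_REGS = 32  # registers 0-31: programmer visible registers
NUM_TEMP_REGS = 8   # registers 32-39: temp buffer


def temp_buffer_read_address(entry):
    """Map a temp buffer entry (0-7) to its 6-bit register file address."""
    if not 0 <= entry < NUM_TEMP_REGS:
        raise ValueError("temp buffer entry out of range")
    return NUM_MAIN_REGS + entry


def is_temp_buffer(address):
    """With 40 registers, the top bit of the 6-bit address selects the temp buffer."""
    return bool(address & 0b100000)
```

For temp buffer address 4, `format(temp_buffer_read_address(4), "06b")` yields `100100`, which matches the read address given in the text.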

In order to perform dependency checking in the present invention, an RRC
Inputs Register File (RIRF) is used to hold the source and destination register
addresses of all instructions in the instruction window. As new instructions enter the
window, an instruction fetch unit (IFU; not shown) sends the instructions' source and
destination register addresses to the RIRF. The source and destination register
addresses are stored in the RIRF by tag number. The RIRF has one output for each
instruction in the window and the source and destination register addresses are read
from the RIRF and sent to the RRC.

The RRC performs the data dependency checking and the register renaming.
The data dependency check is done by the Data Dependency Checker (DDC) and the
rename is performed by the Tag Assignment Logic (TAL).

There are three kinds of data dependencies: input dependencies,
anti-dependencies and output dependencies. An instruction is input dependent on a
previous instruction if one of its inputs is the previous instruction's output; an
instruction is anti-dependent if the address of one of its source registers (RS) is the
same as the address of a later instruction's destination register (RD); and an
instruction is output dependent if the address of its destination register is the same
as the address of another instruction's destination register.
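
The three definitions can be restated as a small classification routine. This is a sketch for illustration only; the dictionary layout (`"rs"` for a list of source register addresses, `"rd"` for the destination register address) and the function name `classify` are assumptions, not part of the disclosure.

```python
def classify(earlier, later):
    """Return the kinds of dependency between two instructions in program order."""
    kinds = set()
    if earlier["rd"] in later["rs"]:
        kinds.add("input")   # later reads the register that earlier writes
    if later["rd"] in earlier["rs"]:
        kinds.add("anti")    # later overwrites a register that earlier reads
    if earlier["rd"] == later["rd"]:
        kinds.add("output")  # both write the same register
    return kinds
```

For example, if the earlier instruction reads registers 1 and 2 and writes register 3, and the later instruction reads registers 3 and 4 and writes register 1, the pair exhibits both an input dependency and an anti-dependency.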

Dependencies limit the amount of parallelism that a computer can exploit. For
example, if instruction A is input dependent on instruction B, then instruction A must
not be executed until after instruction B is executed. Also, if instruction A and B are
output dependent and instruction A comes first in the program, then instruction A's
result must be written to the main register file before instruction B's. Finally, if
instruction A is anti-dependent on instruction B, then instruction B's result must not
be written to the main register file until after instruction A has begun executing.
Output and anti-dependencies are usually avoided by use of the temp buffer. Input
dependencies cannot be avoided and are located by the DDC.

The DDC locates input dependencies by comparing the register file addresses
of each instruction's sources with the register file addresses of each previous
instruction's destination. If an instruction's input data comes from the same register
file address as a previous instruction's output data, then they are dependent (the
term "dependent" will be used to mean "input dependent" for the remainder of this
description).

It is possible that an instruction can be dependent on several previous
instructions. When this happens, the RRC assumes that the programmer intended
that the instruction be dependent on the most recent previous instruction. For example, if
instruction 5 depends on instructions 3 and 1, then the RRC would assume that the
programmer intended instruction 5 to use instruction 3's results and not instruction
1's.

A DDC equal compare circuit 100 for checking dependencies between
instructions A and B in accordance with the present invention is shown in FIG. 1.
The output (A=B?) of the DDC equal compare circuit 100 is sent to a priority encoder.
An n-1 input priority encoder 200 is shown in FIG. 2. Priority encoder 200 checks the
highest priority dependency for instruction n's source (I_nRS). The inputs at the top
of priority encoder 200 are the data dependency comparisons of instruction n's source
(I_nRS) with the destinations of all previous instructions (I_{n-1}RD to I_0RD), as
determined by equal compare circuits 202. For example, if the x-th bit of the priority
encoder output is asserted, then instruction n is input dependent on instruction x.
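
The behavior of the equal compare circuits and the priority encoder can be sketched in software as follows. The function names are hypothetical, and the sketch assumes that the list index equals the instruction number, so that the highest matching index is the most recent previous instruction.

```python
def ddc_compare(source, previous_dests):
    """One equal compare per previous instruction; index x is instruction x."""
    return [source == rd for rd in previous_dests]


def priority_encode(matches):
    """Keep only the match from the most recent previous instruction."""
    onehot = [False] * len(matches)
    for x in range(len(matches) - 1, -1, -1):  # highest index is most recent
        if matches[x]:
            onehot[x] = True
            break
    return onehot
```

Suppose instruction 5 reads register 7 and instructions 0 through 4 write registers 2, 7, 4, 7 and 9. The compare stage reports matches against instructions 1 and 3, and the encoder asserts only bit 3, consistent with the example in which instruction 5 uses instruction 3's result rather than instruction 1's.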

The present invention avoids the problems caused by output and
anti-dependencies by storing the results of all instructions in the temp buffer and then
moving the results into the main register file in program order. For example, if
instruction 1 finishes before instruction 0, its result will be written to the register file
after instruction 0's result is written to the register file. The use of the temp buffer
allows the processor to execute instructions out of order and in parallel. Since the
results of the instructions are moved to the main register file in order, output and
anti-dependencies do not cause a problem.
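
A simplified model of this in-order write-back is sketched below. The data layout (a list of dictionaries with `"tag"`, `"rd"` and `"done"` keys, oldest instruction first) and the function name `retire_ready` are assumptions made for illustration, and the sketch places no limit on how many instructions retire at once.

```python
def retire_ready(window, temp_buffer, main_regs):
    """Move results to the main register file oldest-first.

    Stops at the first instruction that is not yet done, so a younger
    instruction never writes the main register file ahead of an older one.
    Returns the number of instructions retired.
    """
    retired = 0
    while window and window[0]["done"]:
        inst = window.pop(0)
        main_regs[inst["rd"]] = temp_buffer[inst["tag"]]
        retired += 1
    return retired
```

If instruction 1 is done but instruction 0 is not, nothing is written. Once instruction 0 completes, both results are moved in program order, so when the two instructions write the same register the later instruction's value is the one that remains.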

Since the result of an instruction can sit in the temp buffer for a long time
before it gets moved to the main register file, instructions that are dependent on that
instruction will also have to wait. In order to improve performance, the present
invention includes a means with which to use data that is in the temp buffer. That
means is the Tag Assignment Logic (TAL).

The TAL determines the location in the temp buffer of the operands of
dependent instructions. As noted above, all instructions are assigned a tag that
remains constant while the instruction is in the window, and there is one location in
the temp buffer for each instruction in the window. Thus, the processor
implementing the present invention uses the tag of an instruction as the temp buffer
address of that instruction's result.

Since the TAL knows where every instruction's result is stored, and since it
also knows (from the DDC) where the dependencies are between instructions, the
TAL can determine the location in the temp buffer of each instruction's inputs.

A representative block diagram of a TAL 300 used to determine the location of
instruction n's source (RS) is shown in FIG. 3. The outputs of the priority encoder are
connected as select lines (as shown generally at 302) to select the I_{n-1} through I_0
TAGs, which are input at the top of TAL 300. TAL 300 thus outputs the temp buffer
address of instruction n's source.

A complete rename circuit for instruction n's source register is shown in FIG.
4. The term for the renamed register file address of instruction n's source is
I_nRS_TAG.

A representative block diagram of a rename circuit 500 of the present
embodiment is shown in FIG. 5A. The address of a new instruction's source register
(I_nRS) is shown input at the top of the rename circuit 500. The destination register
addresses of all preceding instructions in the window are input to rename circuit 500,
as shown generally at 502. In addition, all the tags of all preceding instructions in the
window are input to rename circuit 500, as shown generally at 504. Rename circuit
500 outputs a tag for the new instruction's source register (I_nRS), as shown at 506.
The new I_nRS tag is assigned by rename circuit 500 according to any dependencies,
as discussed above in connection with the other embodiment(s). If the instruction
has no dependencies, the address input at the top input is simply passed to the
output. Given a VAIW of 8 instructions and assuming that the temp buffers 116
have the 8 highest addresses of the 40 total registers, the most significant bit of the
rename circuit 500 output indicates whether the result is in the main register file or
the temp buffer.
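
The input/output behavior of a single rename circuit can be approximated in software. This is a behavioral sketch only, not the gate-level structure of FIG. 5A; the function name `rename_source` and the `temp_base` parameter are hypothetical, and the lists are assumed to be ordered from oldest to most recent preceding instruction.

```python
def rename_source(source, previous_dests, previous_tags, temp_base=32):
    """Return the register file address to read for `source`.

    If a preceding instruction in the window writes `source`, the most
    recent such instruction wins and the result is its temp buffer
    address. Otherwise the original address is passed through.
    """
    for rd, tag in zip(reversed(previous_dests), reversed(previous_tags)):
        if rd == source:
            return temp_base + tag  # operand is in the temp buffer
    return source                   # no dependency: pass the address through
```

With preceding destinations 2, 7, 4, 7, 9 holding tags 3, 4, 5, 6, 7, a source of register 7 is renamed to address 38 (tag 6 in the temp buffer, with the most significant of the six address bits set), while a source of register 3 is passed through unchanged.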

Renaming circuits 508 for renaming the source registers of a plurality of
instructions are shown in FIG. 5B. The number "i" in FIG. 5B corresponds to the
maximum number of instructions in the window that can be new. In a preferred
embodiment "i" is equal to 3; thus requiring 4 rename circuits.

Because renaming only happens when a new instruction enters the window,
some technique is needed to store the outputs of the RRC. One technique would be to
store the RRC results in a FIFO whenever the instruction window advances. The
present embodiment, however, stores the renamed registers in a separate register
file (discussed below). Since each instruction's tag stays constant, the renamed
source register results from the rename circuits can be stored by tag number. The
register file for storing the rename results therefore has one write port for each new
instruction in the window and one read port for each instruction in the window.

A representative block diagram of a rename result register file (RRRF) 600 is
shown in FIG. 6. The renamed results are input as "WRITE DATA", as shown
generally at 601. "READ ADDRESS" and "WRITE ADDRESS" tags are input to
register file 600, as shown generally at 602 and 604, respectively. Renamed results
for all the sources of all instructions in the window are available as "READ DATA",
as shown generally at 606. FIG. 7 shows rename circuits 508 connected to rename
result register file 600.

When an instruction retires its result is moved into the main register file. If 
any instructions were dependent on that instruction, their renamed sources are not 
needed anymore. 

The area in which new instructions can enter the instruction window (in this
embodiment, the top four locations) comprises those locations which are register
renamed. Once an instruction leaves that area of the window it is no longer renamed.
The RRC of the present invention renames an instruction's source register when it
enters the window, so there needs to be a mechanism to detect which instructions'
sources have been moved to the register file and to replace the renamed source
register address with the original source register address. The first part of that
mechanism is called MONDEP (as in "monitor dependencies") and the second part is
called RFMXING. In addition, a source register ready generator (RDY_GEN) is used
to determine when each instruction's sources are available.

A representative high level block diagram of the RIRF, RRC, RRRF,
RFMXING, MONDEP and RDY_GEN (labeled as 802, 804, 600, 806, 808 and 810,
respectively) is shown in FIG. 8. Each block 802, 804, 600, 806, 808 and 810
receives the tags of all instructions in the instruction window from the tag FIFO (not
shown). Implementation of the tag FIFO will become obvious to one skilled in this art.

Source and destination register addresses of new instructions from the IFU
(not shown) are sent to RIRF 802 via a bus 812, and are accessed by RRC 804 via a
bus 814. The source registers of all instructions are passed to RFMXING 806 via a
bus 816. Renamed source registers of all instructions are stored in RRRF 600 via a
bus 818. The stored renamed source registers of all instructions are passed to
RFMXING 806, MONDEP 808 and RDY_GEN 810 via a bus 820.

MONDEP 808 determines which dependencies have disappeared by
comparing the tags of retiring or recently-retired instructions with the lower three
bits of the renamed sources of each instruction. Information regarding retired
instructions is sent to MONDEP 808 via a bus 828 from a retirement unit (not
shown; the details of a retirement unit that can be used to generate these signals are
disclosed in co-pending, commonly owned patent application Ser. No. 07/877,451, filed
5/15/92). If there is a match, then MONDEP 808 knows that the dependency has
been removed, and MONDEP 808 outputs which instructions' inputs
have been moved from the temp buffer to the register file. These output signals are
sent to RFMXING 806 and RDY_GEN 810 via buses 822.

In a preferred embodiment of the present invention, the instruction window
holds eight instructions. Each cycle, at most three of those instructions can be
retired. In the cycle after an instruction is retired, its tag is moved to the top of the
FIFO. Therefore, to check what dependencies have been removed, MONDEP 808
compares each of the renamed sources of each instruction with the tags of the top
three instructions in the FIFO. In a further embodiment MONDEP 808 can compare
each renamed source with the tags of the instructions at the bottom of the FIFO
that are about to be retired.
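
The comparison performed by MONDEP can be sketched as follows. The function name `mondep` is hypothetical. The check that the address lies in the temp buffer (top address bit set) before comparing the lower three bits is an assumption added here so that a main register address is not mistaken for a tag; the text itself only describes the three-bit comparison. The sketch is also stateless, whereas the described outputs stay asserted until the dependent instruction retires.

```python
def mondep(renamed_sources, retired_tags):
    """Return True for each renamed source whose producer has just retired.

    renamed_sources: 6-bit register file addresses (32-39 is the temp buffer).
    retired_tags: tags (0-7) of the instructions retired this cycle.
    """
    cleared = []
    for addr in renamed_sources:
        in_temp = bool(addr & 0b100000)  # assumption: only temp addresses hold tags
        cleared.append(in_temp and (addr & 0b111) in retired_tags)
    return cleared
```

With renamed sources at addresses 36, 4 and 33 and retiring tags 4 and 5, only the first source is reported as cleared: address 36 is temp buffer entry 4, address 4 is a main register, and address 33 is temp buffer entry 1, whose producer has not retired.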

MONDEP 808 outputs a bit for each source of each instruction and the bits
are sent to RFMXING and RDY_GEN blocks in the RRC. These bits are asserted
when an instruction's dependency goes away and remain asserted until the
instruction is retired.

RDY_GEN 810 determines when each instruction's sources are available and
outputs this information via a bus 830. The difference between MONDEP 808 and
RDY_GEN 810 is that MONDEP 808 only monitors when instructions retire. An
instruction does not have to wait until another instruction retires to use its result; it
only needs to wait until that instruction is done (an instruction is done when its result
is entered into the temp buffer). Also, if an instruction has no dependencies, then it
can be executed immediately. Information concerning whether an instruction is "done"
is input to RDY_GEN 810 via a bus 832. "Done" signals come from done control logic
(not shown). In connection with the present invention, the term "done" means the result
of the instruction is in a temporary buffer or otherwise available at the output of a
functional unit. (An example of done control logic may be found in the '719 application.)

RDY_GEN 810 has one output for each source of all instructions in the
window. The output for a particular instruction's source is asserted if one of three
conditions is true:

1. The source was never dependent on any other instruction.

2. The instruction that the source was dependent on is done and its result
is in the temp buffer.

3. The instruction that the source was dependent on is retired and its
result has been moved from the temp buffer to the register file.

These outputs 830 of RDY_GEN 810 go to the ISSUER, which determines
which instruction(s) is to be issued based on functional unit availability and lack of
dependencies.
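
The three conditions reduce to a simple logical OR, sketched below for illustration. The function and parameter names are hypothetical, and the `instruction_ready` helper, which requires every source of an instruction to be ready, is an assumption about how the ISSUER might combine the per-source outputs.

```python
def source_ready(was_dependent, producer_done, producer_retired):
    """Assert the ready output if any one of the three conditions holds."""
    return (not was_dependent) or producer_done or producer_retired


def instruction_ready(sources):
    """Assumed combination: an instruction is ready when all sources are ready.

    sources: iterable of (was_dependent, producer_done, producer_retired).
    """
    return all(source_ready(*s) for s in sources)
```

A source that was never dependent is ready at once; a dependent source becomes ready as soon as its producer is done, without waiting for the producer to retire.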

RFMXING 806 is used to generate the read addresses of the register files. It
contains a collection of muxes for each read port of each register file. These muxes
are selected by the outputs of the ISSUER and MONDEP 808. Read addresses for
each port of each register file are output by RFMXING 806, via a bus 824. (A
processor may have a separate register file for a floating point subprocessor and an
integer subprocessor, for example.)

The circuit to generate the address for one register file port is shown in FIG. 9.
The ISSUER decides which instructions to execute and which register file ports to
use for each instruction by sending select signals via a bus 826 to RFMXING 806.
MONDEP 808 indicates, via bus 822, which instructions' sources have been moved to
the register file and which are still inside the temp buffer. For example, if one
assumes that the ISSUER decides to execute instruction I_n and I_n is dependent on
I_{n-1}, then the ISSUER will select (via select signals 826) two top multiplexers
(mux) 902 and 904 and choose I_nRS (I_n's original source register address) and
I_nRS_TAG. If the result of I_{n-1} has been moved to the register file, MONDEP 808
will select, using a third mux 906, the output of mux 902 on the left and send I_nRS
to the register file. If not, it will choose the output of mux 904 on the right and
send I_nRS_TAG to the register file.
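
The two-level selection for one read port can be sketched as follows. This is a behavioral approximation only; the function name `read_port_address` and the list-based representation of the mux inputs are assumptions made for illustration.

```python
def read_port_address(candidates, issue_select, producer_retired):
    """Choose the register file read address for one port.

    candidates: per-instruction pairs (original_rs, rs_tag_address).
    issue_select: index of the instruction chosen by the ISSUER.
    producer_retired: per-instruction flags of the kind MONDEP supplies.
    """
    # First level (muxes 902 and 904): pick the issued instruction's two addresses.
    original_rs, rs_tag = candidates[issue_select]
    # Second level (mux 906): original address if the producer's result has
    # already been moved to the main register file, renamed address otherwise.
    return original_rs if producer_retired[issue_select] else rs_tag
```

For an issued instruction whose original source is register 3 and whose renamed source is address 38, the port reads address 38 while the producer is still in the window and register 3 once the producer has retired.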

While various embodiments of the present invention have been described
above, it should be understood that they have been presented by way of example, and
not limitation. Thus the breadth and scope of the present invention should not be
limited by any of the above-described exemplary embodiments, but should be defined
only in accordance with the following claims and their equivalents. All cited patent
documents and publications in the above description are incorporated herein by
reference.

Claims

What is claimed is: 

1. A system for performing register renaming of source registers in a processor
having an instruction window for storing a group of instructions to be executed by the
processor, wherein new instructions are added to the instruction window when the
processor retires preceding instructions, the system comprising:

first means for storing source and destination register addresses for the
instructions in the instruction window;

second means, coupled to said first means, for accessing said stored
source and destination register addresses for performing a data dependency
check for each new instruction added to the instruction window; and

third means, coupled to said second means, for renaming source register
addresses for instructions having dependencies as determined by said second
means.

2. The system of claim 1, wherein the system further comprises a rename result
register file for storing said renamed source register addresses.

3. The system of claim 1, wherein the instruction window is a variable advance
instruction window.

4. The system of claim 3, wherein instructions in the variable advance
instruction window are assigned a tag, and the tag of an instruction leaving the
window is assigned to the next new instruction to be added to the variable advance
instruction window.

5. The system of claim 1, wherein said dependencies are input dependencies. 

6. The system of claim 1, wherein:

said second means determines whether more than one dependency
exists; and

a priority encoder, coupled to said second and third means, which selects
a highest priority dependency identified by said second means and passes said
highest priority dependency to said third means.

7. The system of claim 6, wherein said system further comprises a temp buffer
means for storing results of instructions executed by the processor according to said
tags to avoid output and anti-dependencies, wherein said temp buffer permits the
processor to execute instructions out of order and in parallel.

8. The system of claim 7, wherein said third means comprises tag assignment
logic for determining where in said temp buffer operands of dependent instructions
are located according to said highest priority dependency.

9. The system of claim 8, wherein said system further comprises means for
passing the results stored in said temp buffer to a main register file in program order.

10. A method for performing register renaming of source registers in a processor
having a variable advance instruction window for storing a group of instructions to be
executed by the processor, wherein new instructions are added to the variable
advance instruction window when a location becomes available therein, the method
comprising the steps of:

(a) storing source and destination register addresses for the instructions in
the variable advance instruction window;

(b) assigning a tag to each instruction in the variable advance instruction
window, wherein the tag of each retired instruction is assigned to the next new
instruction to be added to the variable advance instruction window;

(c) storing, in a temp buffer, results of instructions executed by the
processor according to their corresponding tags to avoid output and
anti-dependencies, said temp buffer permitting the processor to execute
instructions out of order and in parallel;

(d) performing data dependency checks for input dependencies for each
new instruction added to the variable advance instruction window;

(e) determining where operands are located in the temp buffer for the
instructions having input dependencies as determined by step (d);

(f) renaming source register addresses of the instructions having
dependencies; and

(g) storing said renamed source register addresses in a rename result
register file.

11. The method of claim 10, further comprising the step of passing the results
stored in the temp buffer to a main register file in program order.